An Iterative Scheme for the Approximate Linear Programming Solution to the Optimal Control of a Markov Decision Process

Abstract

This paper addresses the computational issues involved in the solution to an infinite-horizon optimal control problem for a Markov Decision Process (MDP) with a continuous state component and a discrete control input. The optimal Markov policy for the MDP can be determined based on the fixed point solution to the Bellman equation, which can be rephrased as a constrained Linear Program (LP) with an infinite number of constraints and an infinite dimensional optimization variable (the optimal value function). To compute an (approximate) solution to the LP, an iterative randomized scheme is proposed where the optimization variable is expressed as a linear combination of basis functions in a given class: at each iteration, the resulting semi-infinite LP is solved via constraint sampling, whereas the number of basis functions is progressively increased through the iterations so as to meet some performance goal. The effectiveness of the proposed scheme is shown on a multi-room heating system example.
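
The scheme described in the abstract can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: a discounted-cost formulation with discount factor `GAMMA`, a single normalized room temperature in [0, 1] with a heater that is off or on, the dynamics in `step`, the cost in `stage_cost`, a monomial basis, box bounds on the weights to keep the sampled LP bounded, and a mean Bellman residual on fresh samples as a stand-in for the "performance goal". The LP is solved with `scipy.optimize.linprog`.

```python
import numpy as np
from scipy.optimize import linprog

GAMMA = 0.9             # discount factor (assumed; not stated in the abstract)
CONTROLS = (0.0, 1.0)   # discrete control input: heater off / on (assumed)


def step(x, u, noise):
    """Hypothetical dynamics: next normalized temperature, kept in [0, 1]."""
    return np.clip(0.9 * x + 0.15 * u + noise, 0.0, 1.0)


def stage_cost(x, u):
    """Hypothetical cost: distance from a 0.6 set point plus heating effort."""
    return (x - 0.6) ** 2 + 0.05 * u


def features(x, k):
    """Monomial basis 1, x, ..., x^(k-1); returns shape (len(x), k)."""
    return np.vander(np.atleast_1d(x), k, increasing=True)


def expected_next_features(xs, u, k, noise):
    """Monte Carlo estimate of E[phi(x')] for each state, shape (len(xs), k)."""
    nxt = step(xs[:, None], u, noise[None, :])            # (n, m)
    phi = features(nxt.ravel(), k)                        # (n * m, k)
    return phi.reshape(len(xs), len(noise), k).mean(axis=1)


def solve_sampled_lp(k, xs, noise, bound=100.0):
    """Maximize mean V(x) s.t. V(x) <= c(x,u) + GAMMA * E[V(x')] at sampled x."""
    phi = features(xs, k)
    # One block of constraints per control, all evaluated at the sampled states.
    a_ub = np.vstack([phi - GAMMA * expected_next_features(xs, u, k, noise)
                      for u in CONTROLS])
    b_ub = np.concatenate([stage_cost(xs, u) for u in CONTROLS])
    # linprog minimizes, so the objective is negated; the box bounds are an
    # added assumption that keeps the sampled LP bounded.
    res = linprog(-phi.mean(axis=0), A_ub=a_ub, b_ub=b_ub,
                  bounds=[(-bound, bound)] * k, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x


def bellman_residual(w, xs, noise):
    """min_u [c(x,u) + GAMMA * E[V(x')]] - V(x) at each state in xs."""
    k = len(w)
    q = np.stack([stage_cost(xs, u)
                  + GAMMA * expected_next_features(xs, u, k, noise) @ w
                  for u in CONTROLS])
    return q.min(axis=0) - features(xs, k) @ w


def iterative_alp(tol=0.02, k_max=8, n_samples=200, seed=0):
    """Add basis functions until the validation residual drops below tol."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, 0.02, size=50)     # disturbance samples
    xs = rng.uniform(0.0, 1.0, n_samples)      # constraint samples
    xv = rng.uniform(0.0, 1.0, n_samples)      # fresh validation samples
    for k in range(1, k_max + 1):
        w = solve_sampled_lp(k, xs, noise)
        gap = float(np.mean(np.abs(bellman_residual(w, xv, noise))))
        if gap <= tol:
            break
    return w, k, gap


if __name__ == "__main__":
    w, k, gap = iterative_alp()
    print(f"basis functions: {k}, mean validation residual: {gap:.4f}")
```

The sketch keeps the number of sampled constraints fixed while the basis grows; in a randomized scheme of this kind the sample size would normally be tied to the number of optimization variables, so a more careful version would redraw a larger sample at each iteration. The loop also stops at `k_max` whether or not the tolerance has been reached.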
